Good setup for our use case

Hi,
we are currently thinking about changing how we use PagerDuty and need some input on how we can achieve what we want.

Current state:
We have a few different services that represent different environments of our product (staging, production-eu, production-us, global-infrastructure). We have one team with an on-call schedule that rotates weekly and covers all of these “services”. Each alerting tool reports to the appropriate PagerDuty service based on the environment.

What we want to have:
We want the current team and schedule to be responsible only for production alerts. For staging alerts, we want to involve our development teams so that they receive alerts for the microservices they are responsible for.

Example:
Team A (Dev-Team): microserviceA, microserviceB
Team B (Dev-Team): microserviceC, microserviceD
Team C (Dev-&SRE-Team): microserviceE, microserviceF

Alert of microserviceA in staging should be routed to Team A
Alert of microserviceC in production-eu should be routed to Team C
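The routing described in the example can be sketched as a small ownership table. This is only an illustration, assuming Team C is the current on-call team that keeps production, and assuming (since the post does not say) that global-infrastructure also stays with that team. The names `OWNERS`, `PRODUCTION_TEAM` and `route` are hypothetical.

```python
# Hypothetical ownership table built from the example teams above.
OWNERS = {
    "microserviceA": "Team A",
    "microserviceB": "Team A",
    "microserviceC": "Team B",
    "microserviceD": "Team B",
    "microserviceE": "Team C",
    "microserviceF": "Team C",
}

# Assumption: Team C is the current on-call team that remains responsible for production.
PRODUCTION_TEAM = "Team C"


def route(microservice: str, environment: str) -> str:
    """Return the team that should be alerted for an event."""
    if environment == "staging":
        # Staging alerts go to the dev team that owns the microservice.
        return OWNERS[microservice]
    # Assumption: every non-staging environment (production-eu, production-us,
    # global-infrastructure) stays with the current production team.
    return PRODUCTION_TEAM


print(route("microserviceA", "staging"))        # Team A
print(route("microserviceC", "production-eu"))  # Team C
```

Keeping ownership in one table means onboarding a new microservice is a single entry rather than a new forwarding habit in chat.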

We want to do this because teams should be responsive to alerts in our development environment (staging). Currently, Team C has to forward each staging alert manually in our organization's chat tool. This does not scale and prevents the Dev-Teams from taking responsibility for their services.

Problem we are facing:
We are unsure how to set up the structure in PagerDuty. Defining a service per microservice seems to be the right approach, but we are unsure how to route alerts from different environments to different teams.

Thanks in advance for any help!
Best, Jakob

Hi Jakob,

Thanks for reaching out on our community page. What you describe sounds very achievable, and the best way to do it would be to use our Rulesets feature.

You would need to create services for your microservices, along with corresponding Escalation Policies and schedules for the teams that need to be alerted about, for example, the staging environment. You can then use Rulesets to create event rules that direct incoming events to the right team. This has the benefit of sending all events through a single routing key.
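The idea behind event rules (one routing key, with rules deciding which service an event lands on) can be simulated locally as "first matching rule wins, otherwise fall back". This is a simplified model for illustration only, not the PagerDuty Rulesets API: it supports only exact-match conditions, and the `Rule` class, field names and service names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # event field name -> required value (exact match only)
    service: str      # hypothetical target PagerDuty service name


def matches(rule: Rule, details: dict) -> bool:
    """True if every condition of the rule is satisfied by the event fields."""
    return all(details.get(key) == value for key, value in rule.conditions.items())


def evaluate(rules: list, details: dict, fallback: str) -> str:
    """Return the service of the first matching rule, else the fallback service."""
    for rule in rules:
        if matches(rule, details):
            return rule.service
    return fallback


RULES = [
    Rule({"environment": "staging", "microservice": "microserviceA"}, "microserviceA-staging"),
    Rule({"environment": "staging", "microservice": "microserviceC"}, "microserviceC-staging"),
    Rule({"environment": "production-eu"}, "production-eu"),
]

print(evaluate(RULES, {"environment": "staging", "microservice": "microserviceA"}, "catch-all"))
print(evaluate(RULES, {"environment": "production-eu", "microservice": "microserviceC"}, "catch-all"))
```

Rule order matters in this model: more specific rules go first, and a catch-all fallback keeps unmatched events from being silently dropped.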

Hope this helps, let us know if you have any more questions.

John

Jakob,

I would encourage you to broaden your thinking a bit. What are the business-aligned products/offerings/services/applications that those microservices enable or support? I would consider adding a few more layers of business and/or technical services in PagerDuty. These help you not only mobilize the right team(s) but also understand the impact on your business and/or customers, where those impacts are, and how severe they may be, and they position you for the future with a well-configured PagerDuty foundation.

Then you can start to onboard alerts and route them into the right PagerDuty Technical Service and related teams using this new context. The key to success is that every one of your incoming monitoring events/alerts contains rich metadata (tags, labels, a structured host/node name, etc.) so you can easily identify what the event/alert comes from, what it impacts, and which PagerDuty Technical Service it should be routed to. In your example, I’d want each event/alert to contain something like this: “Business Service: eCommerce, Business Application: Web Commerce, Function/Microservice: Check Inventory, Environment: Production”. I’d then create a rule in the ruleset for the eCommerce application (or team) that routes the event to the eCommerce:Web:Check Inventory:Dev service, notifying the on-call dev team (using low-urgency notifications 🙂).
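As a sketch of what "rich metadata" could look like on the wire, the function below builds a trigger event in the PagerDuty Events API v2 shape (`routing_key`, `event_action`, and a `payload` with `summary`, `source`, `severity` and `custom_details`), which would be POSTed as JSON to `https://events.pagerduty.com/v2/enqueue`. The code only builds the event and does not send it; the routing key, summary, host name and `custom_details` keys are hypothetical placeholders mirroring the example above.

```python
import json

# Severity values accepted by Events API v2.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_event(routing_key: str, summary: str, source: str,
                severity: str, details: dict) -> dict:
    """Build (but do not send) an Events API v2 trigger event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            # Business/technical context that ruleset conditions can match on.
            "custom_details": details,
        },
    }


event = build_event(
    routing_key="YOUR_RULESET_ROUTING_KEY",               # placeholder
    summary="Check Inventory latency above threshold",    # hypothetical alert text
    source="check-inventory-01.prod.example.com",         # hypothetical host name
    severity="warning",
    details={
        "business_service": "eCommerce",
        "business_application": "Web Commerce",
        "function": "Check Inventory",
        "environment": "Production",
    },
)
print(json.dumps(event, indent=2))
```

Putting the context in `custom_details` keeps it consistent across monitoring tools, so a single set of ruleset conditions can match on it.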

Make sense?

Doug

Thank you for your replies!